Query classification using Wikipedia

Author

  • Richard Khoury
Abstract

Identifying the intended topic that underlies a user’s query can benefit a large range of applications, from search engines to question-answering systems. However, query classification remains a difficult challenge due to the variety of queries a user can ask, the wide range of topics users can ask about, and the limited amount of information that can be mined from the query. In this paper, we develop a new query classification system that accounts for these three challenges. Our system relies on the freely available online encyclopaedia Wikipedia as a natural-language knowledge base, and exploits Wikipedia’s structure to infer the correct classification of any given query. We present two variants of this query classification system and demonstrate their reliability, relative to each other and to benchmarks from the literature, using the query sets from the KDD Cup 2005 and TREC 2007 competitions.
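The abstract does not spell out how Wikipedia's structure is used, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a tiny, made-up stand-in for Wikipedia: `ARTICLE_CATEGORIES` maps hypothetical article titles to categories, and `CATEGORY_TO_TOPIC` maps those categories to hypothetical target topics. The function name `classify_query` and the overlap-count scoring are likewise assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data standing in for Wikipedia articles and their categories.
# None of this is real Wikipedia content or the paper's actual data.
ARTICLE_CATEGORIES = {
    "jaguar": ["Felines", "Car manufacturers"],
    "leopard": ["Felines"],
    "engine": ["Car parts"],
}

# Hypothetical mapping from Wikipedia-style categories to target topic labels.
CATEGORY_TO_TOPIC = {
    "Felines": "Animals",
    "Car manufacturers": "Autos",
    "Car parts": "Autos",
}


def classify_query(query, top_n=1):
    """Return up to top_n topic labels for a query (assumed scoring scheme).

    Each query word that appears in an article title adds one vote to every
    topic reachable from that article's categories.
    """
    words = query.lower().split()
    scores = Counter()
    for title, categories in ARTICLE_CATEGORIES.items():
        title_words = set(title.split())
        overlap = sum(1 for word in words if word in title_words)
        if overlap == 0:
            continue  # this article shares no words with the query
        for category in categories:
            topic = CATEGORY_TO_TOPIC.get(category)
            if topic is not None:
                scores[topic] += overlap
    return [topic for topic, _ in scores.most_common(top_n)]


print(classify_query("jaguar engine"))
print(classify_query("jaguar leopard"))
```

In this sketch an ambiguous word such as "jaguar" votes for both topics, and the other words in the query tip the balance. This shows one way a short query with little information of its own might be disambiguated through an external knowledge base.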

Similar resources

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

Using Encyclopaedic Knowledge for Query Classification

Identifying the intended topic that underlies a user’s query can benefit a large range of applications, from search engines to question-answering systems. However, query classification remains a difficult challenge due to the variety of queries a user can ask, the wide range of topics users can ask about, and the limited amount of information that can be mined from the query. In this paper, we ...

Wikipedia-based Unsupervised Query Classification

In this paper we present an unsupervised approach to Query Classification. The approach exploits the Wikipedia encyclopedia as a corpus and the statistical distribution of terms, from both the category labels and the query, in order to select an appropriate category. We have created a classifier that works with 55 categories extracted from the search section of the Bridgeman Art Library website...

Blind Relevance Feedback with Wikipedia: Enterprise Track

In this year’s Enterprise track experiment, we focused on testing Blind Relevance Feedback, especially using online Wikipedia as the query expansion collection. We demonstrated that using Wikipedia as the query expansion collection returns better infNDCG than not using it.

"Piaf" vs "Adele": classifying encyclopedic queries using automatically labeled training data

Encyclopedic queries express the intent of obtaining information typically available in encyclopedias, such as biographical, geographical or historical facts. In this paper, we train a classifier for detecting the encyclopedic intent of web queries. For training such a classifier, we automatically label training data from raw query logs. We use click-through data to select positive examples of ...

Journal title:
  • IJIIDS

Volume 5  Issue 

Pages  -

Publication date 2011